- Part 1 - Dataset Enrichment with Zero-Shot Classification Models
- Part 2 - Dataset Enrichment with Zero-Shot Detection Models
- Part 3 - Dataset Enrichment with Zero-Shot Segmentation Models
👍 Purpose This notebook shows how to enrich your image dataset using labels generated with open-source zero-shot image classification (or image tagging) models such as Recognize Anything (RAM) and Tag2Text. By the end of the notebook, you’ll learn how to:
- Install and load the RAM and Tag2Text models in fastdup.
- Enrich your dataset using labels generated by the RAM and Tag2Text models.
- Run inference using the RAM and Tag2Text models on a single image.
Installation
First, let’s install the necessary packages:
- fastdup - To analyze issues in the dataset.
- Recognize Anything - To use the RAM and Tag2Text models.
- gdown - To download demo data hosted on Google Drive.
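Once the packages are installed, a quick check can confirm they are importable before running the rest of the notebook. The following is a minimal sketch using only the standard library. The import names `fastdup`, `ram` (for Recognize Anything), and `gdown` are assumptions about how each package exposes itself, and `missing_packages` is a hypothetical helper, not part of fastdup.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed import names: fastdup, ram (Recognize Anything), gdown.
required = ["fastdup", "ram", "gdown"]
missing = missing_packages(required)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, install it with your usual package manager before continuing.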
🚧 CUDA Runtime fastdup runs perfectly well on CPUs, but larger models like RAM and Tag2Text run much slower on a CPU than on a GPU. The code in this notebook can run on either, but we highly recommend a CUDA-enabled environment to reduce the run time. Running this notebook in Google Colab or Kaggle is a good start!
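To see which kind of environment you are in, you can ask PyTorch whether a CUDA device is visible. This is a small sketch, assuming PyTorch is the backend used by RAM and Tag2Text. `pick_device` is a hypothetical helper that falls back to the CPU when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # assumed backend for RAM and Tag2Text
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```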
Download Dataset
Download the coco-minitrain dataset, a curated mini training set consisting of about 20% of the COCO 2017 training dataset. coco-minitrain contains 25,000 images and annotations.
First, let’s download the coco-minitrain dataset.
Inference with RAM and Tag2Text
Within fastdup you can readily use zero-shot image tagging models such as Recognize Anything Model (RAM) and Tag2Text. Both Tag2Text and RAM exhibit strong recognition ability.
- RAM is an image tagging model that can recognize any common category with high accuracy. Its authors report that it outperforms CLIP and BLIP.
- Tag2Text is a vision-language model guided by tagging that supports captioning, retrieval, and tagging.
1. Inference on a bulk of images
To run inference on the downloaded dataset, you first need to load the image paths into a DataFrame.
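One way to build that DataFrame is to walk the extracted dataset directory and collect every image file. The sketch below is illustrative: the helper names, the `filename` column name, and the directory in the usage comment are assumptions, not values confirmed by this tutorial. pandas is imported lazily so that the path collection also works without it.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_image_paths(root):
    """Recursively collect image file paths under `root`, sorted for reproducibility."""
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def to_dataframe(paths):
    """Wrap the paths in a single-column DataFrame (requires pandas)."""
    import pandas as pd
    return pd.DataFrame({"filename": paths})


# Usage (the directory name is a placeholder for wherever the dataset was extracted):
# df = to_dataframe(collect_image_paths("coco_minitrain_25k/images"))
```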
Running zero-shot image tagging on the DataFrame is as easy as:
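A minimal sketch of such a call is given here. The `fastdup.create` and `fd.enrich` usage, the task name `zero-shot-classification`, and the model identifiers `recognize-anything-model` and `tag2text` are assumptions based on the fastdup documentation and should be checked against your installed version. `build_enrich_args` and `enrich_with_tags` are hypothetical wrappers.

```python
# Model identifiers assumed from the fastdup documentation.
MODEL_NAMES = {"ram": "recognize-anything-model", "tag2text": "tag2text"}


def build_enrich_args(df, model="ram", input_col="filename"):
    """Assemble the keyword arguments passed to fd.enrich for a tagging model."""
    if model not in MODEL_NAMES:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODEL_NAMES)}")
    return {
        "task": "zero-shot-classification",
        "model": MODEL_NAMES[model],
        "input_df": df,
        "input_col": input_col,
    }


def enrich_with_tags(df, input_dir, model="ram"):
    """Run fd.enrich on `df` and return the enriched DataFrame (requires fastdup)."""
    import fastdup  # imported lazily so build_enrich_args works without it
    fd = fastdup.create(input_dir=input_dir)
    return fd.enrich(**build_enrich_args(df, model))
```

Keeping the argument assembly separate from the call makes it easy to switch between RAM and Tag2Text by changing a single argument.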
📘 More on fd.enrich fd.enrich enriches an input DataFrame by applying a specified model to perform a specific task, controlled by the parameters passed to it.
As a result of running fd.enrich, an additional column 'ram_tags' is appended to the DataFrame, listing all the relevant tags for the corresponding image.
Let’s plot the results of the enrichment to see the tags and captions given by the RAM and Tag2Text models.
2. Inference on a single image
We can use these models in fastdup in a few lines of code. Let’s suppose we’d like to run inference on the following image. Import RecognizeAnythingModel and run inference.
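A sketch of what this can look like follows. The import path `fastdup.models_ram` and the `run_inference` method are assumptions based on the fastdup documentation, and `tag_single_image` is a hypothetical wrapper that checks the file exists before loading the model.

```python
from pathlib import Path


def tag_single_image(image_path):
    """Return the tags RAM assigns to one image (requires fastdup)."""
    path = Path(image_path)
    if not path.is_file():
        # Fail early, before the model weights are loaded.
        raise FileNotFoundError(f"image not found: {path}")
    # Assumed import path and method name; verify against your fastdup version.
    from fastdup.models_ram import RecognizeAnythingModel
    model = RecognizeAnythingModel()
    return model.run_inference(str(path))
```

Checking the path first avoids paying the model loading cost only to fail on a mistyped file name.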
As shown above, the model outputs all the tags associated with the query image.
But what if you have a collection of images and would like to run zero-shot classification on all of them? fastdup provides the fd.enrich API for exactly this.
Wrap Up
In this tutorial, we showed how you can run zero-shot image classification (or image tagging) models to enrich your dataset. This notebook is Part 1 of the dataset enrichment notebook series where we utilize various zero-shot models to enrich datasets.
- Part 1 - Dataset Enrichment with Zero-Shot Classification Models
- Part 2 - Dataset Enrichment with Zero-Shot Detection Models
- Part 3 - Dataset Enrichment with Zero-Shot Segmentation Models
👍 Next Up Try out the Google Colab and Kaggle notebooks to reproduce this example. Also, check out Part 2 of the series, where we explore how to generate bounding boxes from the tags using zero-shot detection models like Grounding DINO. See you there!
Questions about this tutorial? Reach out to us on our Slack channel!
VL Profiler - A faster and easier way to diagnose and visualize dataset issues
The team behind fastdup also recently launched VL Profiler, a no-code cloud-based platform that lets you leverage fastdup in the browser. VL Profiler lets you find:
- Duplicates/near-duplicates.
- Outliers.
- Mislabels.
- Non-useful images.
👍 Free Usage Use VL Profiler for free to analyze issues in your dataset with up to 1,000,000 images. Get started for free.
Not convinced yet? Interact with a collection of datasets like ImageNet-21K, COCO, and DeepFashion here. No sign-ups needed.